Techniques for Dealing with Missing Data in Knowledge Discovery Tasks
Abstract
Information plays a very important role in our lives. Advances in many research fields depend on the ability to discover knowledge in very large databases, and many businesses base their success on the availability of marketing information. Such data is usually large and not always easy to manage. Scientists from different research areas have developed methods to analyze huge amounts of data and to extract useful information. These methods may extract different kinds of knowledge, depending on the data and on user requirements. One important knowledge discovery task is supervised learning. Today there are many methods for building classifiers, drawn from different fields such as artificial intelligence, soft computing, and statistics. Unfortunately, traditional methods usually cannot deal directly with real-world data because of missing or wrong items. This report concerns the former problem: the unavailability of some values. The majority of interesting databases are incomplete, i.e., one or more values are missing from some records, or some records are missing altogether. Many techniques exist for managing data with missing items, but none is better than the others in every situation: different situations require different solutions. As Allison says, “the only really good solution to the missing data problem is not to have any” [1]. This report reviews the main missing data techniques (MDTs) and tries to highlight their advantages and disadvantages. The next section introduces some terminology and presents a taxonomy of MDTs. Section 3 describes these methods in more detail. Finally, some conclusions are reported.
Similar resources
A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining
Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...
The Significance of the Missing Data Problem in Knowledge Discovery
Knowledge discovery in databases (KDD) is a field that is enjoying much attention in the literature and rapid growth in algorithms, techniques, and available software. KDD is defined as a non-trivial process which gleans valid, previously unknown, and potentially useful information from stored data [9]. One key area of KDD that has not kept up with this phenomenal growth is the area of data pre...
Knowledge Discovery for Semantic Web
Knowledge Discovery is traditionally used for analysis of large amounts of data and enables addressing a number of tasks that arise in Semantic Web and require scalable solutions. Additionally, Knowledge Discovery techniques have been successfully applied not only to structured data, i.e., databases, but also to semi-structured and unstructured data including text, graphs, images and video. Semant...
Application of Rough Set Theory in Data Mining for Decision Support Systems (DSSs)
Decision support systems (DSSs) are prevalent information systems for decision making in many competitive business environments. In a DSS, the decision-making process is intimately related to some factors which determine the quality of information systems and their related products. Traditional approaches to data analysis usually cannot be implemented in sophisticated companies, where managers ne...
An Enhanced Approach for Treating Missing Value using Boosted K-NN
Knowledge Discovery in Dataset (KDD) plays a vital role in information analysis and retrieval-based applications. Quality of data is the most indispensable component of KDD. A factor which affects the quality of datasets is the presence of missing values. The data collected from the real world often contains serious data quality troubles such as incomplete, redundant, inconsistent, and/or noisy d...